Vessel segmentation in medical images is one of the essential tasks for diagnosing vascular diseases and planning treatment. Although learning-based segmentation methods have been studied extensively, supervised approaches require large amounts of ground-truth labels, and confusing background structures make it difficult for neural networks to segment vessels in an unsupervised manner. To address this problem, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and we apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background image distribution using a diffusion module, which lets a generation module effectively provide vessel representations. Moreover, through adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further helps the model capture vessel-relevant semantic information. Once trained, the proposed model generates segmentation masks in a single step and can be applied to general vascular structure segmentation in coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised methods in vessel segmentation.
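To make the switchable spatially-adaptive denormalization idea concrete, below is a minimal PyTorch sketch of such a layer: when a vessel mask is supplied, it predicts per-pixel scale and shift maps from the mask, and otherwise it falls back to a plain learned affine transform. The module name, layer sizes, and switching rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableSPADE(nn.Module):
    """Minimal sketch of switchable spatially-adaptive denormalization.

    With a vessel mask, per-pixel scale/shift maps are predicted from
    the mask (SPADE-style); without one, the layer behaves like a
    plain learned affine normalization. Sizes are illustrative.
    """

    def __init__(self, channels, mask_channels=1, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # SPADE branch: predict spatial gamma/beta from the mask.
        self.shared = nn.Sequential(
            nn.Conv2d(mask_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, channels, 3, padding=1)
        # Fallback branch: ordinary learned affine parameters.
        self.plain_gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.plain_beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x, mask=None):
        h = self.norm(x)
        if mask is None:  # no mask: standard normalization path
            return self.plain_gamma * h + self.plain_beta
        m = self.shared(F.interpolate(mask, size=x.shape[2:]))
        return (1 + self.gamma(m)) * h + self.beta(m)
```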
Most of the recent few-shot learning algorithms are based on transfer learning, where a model is pre-trained using a large amount of source data and the pre-trained model is subsequently updated using a small amount of target data. In transfer-based few-shot learning, sophisticated pre-training methods have been widely studied to obtain universal and improved representations. However, there has been little study on how to update the pre-trained model for few-shot learning. In this paper, we compare two popular updating methods, fine-tuning (i.e., updating the entire network) and linear probing (i.e., updating only the linear classifier), considering the distribution shift between the source and target data. We find that fine-tuning performs better than linear probing as the number of samples increases, regardless of the distribution shift. Next, we investigate the effectiveness and ineffectiveness of data augmentation when pre-trained models are fine-tuned. Our fundamental analyses demonstrate that careful consideration of the details of updating pre-trained models is required for better few-shot performance.
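The two update strategies compared here differ only in which parameters receive gradients. A minimal PyTorch sketch (using a torchvision ResNet-18 as a stand-in for the pre-trained source model; the helper name and `mode` flag are placeholders):

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_for_update(num_classes, mode="linear_probe"):
    """Configure a pre-trained model for one of the two update
    strategies compared in the paper."""
    model = resnet18(weights="IMAGENET1K_V1")                 # source-pre-trained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new target head
    if mode == "linear_probe":
        # Freeze everything except the linear classifier.
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith("fc.")
    elif mode == "fine_tune":
        # Update the entire network.
        for p in model.parameters():
            p.requires_grad = True
    return model
```

Under this setup, linear probing trains only the head `fc`, while fine-tuning updates every layer; this is exactly the axis along which the paper varies target sample size and distribution shift.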
Gastric endoscopic screening is an effective way to decide on appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality. Although artificial intelligence (AI) holds great promise for assisting pathologists in screening digitized whole-slide images, existing AI systems are limited to fine-grained cancer subclassification and have little usability in planning cancer treatment. We propose a practical AI system that enables five subclassifications of GC pathology, which can be directly matched to general GC treatment guidelines. The AI system is designed to efficiently differentiate multi-class GC through a multi-scale self-attention mechanism using a two-stage hybrid vision transformer (ViT) network, mimicking the way human pathologists understand histology. The AI system demonstrates reliable diagnostic performance by achieving a class-average sensitivity above 0.85 on 1,212 slides from multicentric cohorts. Furthermore, AI-assisted pathologists show significantly improved diagnostic sensitivity, by 12%, compared with human pathologists alone. Our results demonstrate that AI-assisted gastric endoscopic screening has great potential to provide presumptive pathologic opinions and guide appropriate gastric cancer treatment in real-world clinical settings.
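As a rough illustration of multi-scale self-attention over a tissue tile, the sketch below extracts patch tokens at two magnifications and attends over them jointly. All dimensions, the two patch sizes, and the single-encoder design are assumptions for illustration, not the paper's two-stage hybrid ViT.

```python
import torch
import torch.nn as nn

class TwoScaleViTSketch(nn.Module):
    """Illustrative joint attention over fine and coarse patch tokens
    from one tile, mimicking inspection at several magnifications.
    Dimensions are placeholders, not the paper's architecture."""

    def __init__(self, dim=256, heads=8, depth=4):
        super().__init__()
        self.fine = nn.Conv2d(3, dim, kernel_size=16, stride=16)    # fine patches
        self.coarse = nn.Conv2d(3, dim, kernel_size=32, stride=32)  # coarse patches
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 5)  # five treatment-relevant GC subclasses

    def forward(self, x):                     # x: (B, 3, H, W) tile
        f = self.fine(x).flatten(2).transpose(1, 2)
        c = self.coarse(x).flatten(2).transpose(1, 2)
        tokens = torch.cat([f, c], dim=1)     # joint multi-scale token sequence
        return self.head(self.encoder(tokens).mean(dim=1))
```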
Due to the scarcity of segmentation labels, extensive research has been conducted on training segmentation networks with domain adaptation, semi-supervised, or self-supervised learning techniques to exploit abundant unlabeled datasets. However, these approaches differ from one another, so it is not clear how they can be combined to improve performance. Inspired by recent multi-domain image translation methods, here we propose a novel segmentation framework using adaptive instance normalization (AdaIN), so that a single generator is trained to perform both domain adaptation and semi-supervised segmentation tasks via knowledge distillation, simply by changing a task-specific AdaIN code. Specifically, our framework is designed to handle the difficult case in chest X-ray radiograph (CXR) segmentation where labels are only available for normal data but the trained model must be applied to both normal and abnormal data. The proposed network demonstrates strong generalizability under domain shift and achieves state-of-the-art performance for abnormal CXR segmentation.
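A minimal sketch of the task-switching mechanism, assuming the common AdaIN formulation in which a learned task code produces the affine parameters applied after instance normalization (the class name and code sizes are hypothetical):

```python
import torch
import torch.nn as nn

class TaskAdaIN(nn.Module):
    """Adaptive instance normalization whose affine parameters come
    from a learned task code, so one generator can be switched between
    tasks (e.g., domain adaptation vs. segmentation) by swapping the
    code. Names and sizes are illustrative."""

    def __init__(self, channels, code_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(code_dim, 2 * channels)  # -> (gamma, beta)

    def forward(self, x, task_code):
        gamma, beta = self.affine(task_code).chunk(2, dim=-1)
        return (gamma[:, :, None, None] * self.norm(x)
                + beta[:, :, None, None])

# One generator, two behaviours: swap the task code at inference time,
# e.g. rows of an illustrative nn.Embedding(num_tasks, code_dim).
```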
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
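As a sketch of the vector quantization approach, the snippet below jointly quantizes the two feature samples with k-means and traces KL divergences to mixture distributions, yielding points on a divergence frontier. It is a simplified illustration, not the reference MAUVE implementation; the smoothing constant and mixture grid are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def divergence_frontier(p_feats, q_feats, k=64, grid=25):
    """Vector-quantization estimate of a divergence frontier: jointly
    quantize both samples into k bins, then compute KL divergences to
    mixtures R_lam = lam*P + (1-lam)*Q for a grid of lam values."""
    both = np.concatenate([p_feats, q_feats])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(both)
    p = np.bincount(labels[:len(p_feats)], minlength=k) + 1e-8  # smoothed counts
    q = np.bincount(labels[len(p_feats):], minlength=k) + 1e-8
    p, q = p / p.sum(), q / q.sum()
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    lams = np.linspace(0.01, 0.99, grid)
    # Each point captures the two error types: (KL(Q||R), KL(P||R)).
    return [(kl(q, l * p + (1 - l) * q), kl(p, l * p + (1 - l) * q))
            for l in lams]
```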
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is very restrictive. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves $\tilde{\mathcal{O}}(d \sqrt{H^3 T})$ regret bound where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and practical superior performance.
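For concreteness, the multinomial logistic transition model referred to here can be written in the standard softmax form (notation inferred from the abstract rather than quoted from the paper, with $\varphi$ the state-action-next-state features and $\theta$ the unknown transition core):

```latex
p(s' \mid s, a) \;=\;
\frac{\exp\!\big(\varphi(s, a, s')^{\top}\theta\big)}
     {\sum_{\tilde{s} \in \mathcal{S}_{s,a}} \exp\!\big(\varphi(s, a, \tilde{s})^{\top}\theta\big)},
\qquad \theta \in \mathbb{R}^{d},
```

where $\mathcal{S}_{s,a}$ denotes the set of states reachable from $(s, a)$; the $d$ in the regret bound is the dimension of $\theta$.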
Knowledge tracing (KT) aims to leverage students' learning histories to estimate their mastery levels on a set of pre-defined skills, based on which the corresponding future performance can be accurately predicted. In practice, a student's learning history comprises answers to sets of massed questions, each known as a session, rather than merely being a sequence of independent answers. Theoretically, within and across these sessions, students' learning dynamics can be very different. Therefore, how to effectively model the dynamics of students' knowledge states within and across sessions is crucial for handling the KT problem. Most existing KT models treat a student's learning records as a single continuing sequence, without capturing the sessional shift of students' knowledge state. To address the above issue, we propose a novel hierarchical transformer model, named HiTSKT, which comprises an interaction(-level) encoder to capture the knowledge a student acquires within a session, and a session(-level) encoder to summarise acquired knowledge across past sessions. To predict an interaction in the current session, a knowledge retriever integrates the summarised past-session knowledge with the previous interactions' information into proper knowledge representations. These representations are then used to compute the student's current knowledge state. Additionally, to model the student's long-term forgetting behaviour across sessions, a power-law-decay attention mechanism is designed and deployed in the session encoder, allowing it to place more emphasis on recent sessions. Extensive experiments on three public datasets demonstrate that HiTSKT achieves new state-of-the-art performance on all the datasets compared with six state-of-the-art KT models.
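One plausible reading of power-law-decay attention is to damp attention weights by a power of each session's age before renormalizing, so recent sessions dominate. The sketch below follows that reading; the decay exponent and the post-softmax placement are assumptions, not HiTSKT's exact formulation.

```python
import torch

def power_law_decay_attention(q, k, v, session_ages, alpha=0.5):
    """Session-level attention with power-law forgetting: raw attention
    is damped by (age + 1)^(-alpha). `session_ages` counts sessions back
    from the present; alpha is an illustrative decay exponent."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (B, T, S)
    decay = (session_ages.float() + 1.0) ** (-alpha)       # (S,)
    weights = torch.softmax(scores, dim=-1) * decay        # damp older sessions
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
    return weights @ v
```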
This work presents a detailed linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to 'memorize' sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.
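For the surprisal side of such analyses, here is a minimal per-token surprisal extractor using the Hugging Face `transformers` API; it uses GPT-2 for concreteness, and the same recipe applies to the GPT-Neo and OPT checkpoints via their corresponding model classes.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def token_surprisals(text, name="gpt2"):
    """Per-token surprisal in bits, -log2 p(token | context)."""
    tok = GPT2TokenizerFast.from_pretrained(name)
    model = GPT2LMHeadModel.from_pretrained(name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(model(ids).logits, dim=-1)
    # Surprisal of token t is read from the prediction at position t-1.
    s = -logp[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)
    s = s / torch.log(torch.tensor(2.0))  # nats -> bits
    return list(zip(tok.convert_ids_to_tokens(ids[0, 1:]), s.tolist()))
```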
Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from our novel knowledge graph that encodes object-region relations, spatial knowledge from dual Graph Encoder Networks, and background knowledge from a series of pre-training tasks -- all within a reinforcement learning framework for audio-visual navigation. We also define a new audio-visual navigation sub-task, where agents are evaluated on novel sounding objects, as opposed to unheard clips of known objects. We show improvements over strong baselines in generalisation to unseen regions and novel sounding objects, within the Habitat-Matterport3D simulation environment, under the SoundSpaces task.
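As a toy illustration of encoding object-region relations as scene priors, the sketch below builds a small relation graph and runs one message-passing step over it. The entities, edge set, and single-layer encoder are hypothetical placeholders, not the paper's knowledge graph or dual Graph Encoder Networks.

```python
import torch
import torch.nn as nn

# Illustrative object-region knowledge graph: nodes are object/region
# categories; edges encode "object o is typically found in region r".
nodes = ["piano", "tv", "living_room", "bedroom"]
edges = [("piano", "living_room"), ("tv", "living_room"), ("tv", "bedroom")]

idx = {n: i for i, n in enumerate(nodes)}
A = torch.eye(len(nodes))                 # self-loops
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
A = A / A.sum(dim=1, keepdim=True)        # row-normalized adjacency

class GraphEncoderLayer(nn.Module):
    """One message-passing step: H' = ReLU(A H W)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w = nn.Linear(d_in, d_out)
    def forward(self, H, A):
        return torch.relu(A @ self.w(H))

H = torch.randn(len(nodes), 16)           # initial node embeddings
prior = GraphEncoderLayer(16, 16)(H, A)   # scene-prior node features
```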
Transformer-based large language models are trained to make predictions about the next word by aggregating representations of previous tokens through their self-attention mechanism. In the field of cognitive modeling, such attention patterns have recently been interpreted as embodying the process of cue-based retrieval, in which attention over multiple targets is taken to generate interference and latency during retrieval. Under this framework, this work first defines an entropy-based predictor that quantifies the diffuseness of self-attention, as well as distance-based predictors that capture the incremental change in attention patterns across timesteps. Moreover, following recent studies that question the informativeness of attention weights, we also experiment with alternative methods for incorporating vector norms into attention weights. Regression experiments using predictors calculated from the GPT-2 language model show that these predictors deliver a substantially better fit to held-out self-paced reading and eye-tracking data over a rigorous baseline including GPT-2 surprisal. Additionally, the distance-based predictors generally demonstrated higher predictive power, with effect sizes of up to 6.59 ms per standard deviation on self-paced reading times (compared to 2.82 ms for surprisal) and 1.05 ms per standard deviation on eye-gaze durations (compared to 3.81 ms for surprisal).
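Both families of predictors can be sketched directly from attention rows. Below, entropy measures the diffuseness of the current token's attention and an L1 distance captures the incremental change from the previous timestep; averaging over heads and zero-padding to a common length are illustrative choices, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def attention_predictors(attn_prev, attn_curr):
    """Two predictor families from attention rows over previous tokens
    (already averaged over heads): entropy of the current distribution,
    and L1 distance between consecutive distributions."""
    eps = 1e-12
    entropy = -(attn_curr * (attn_curr + eps).log()).sum().item()
    n = max(attn_prev.numel(), attn_curr.numel())
    a = F.pad(attn_prev, (0, n - attn_prev.numel()))  # align lengths
    b = F.pad(attn_curr, (0, n - attn_curr.numel()))
    distance = (a - b).abs().sum().item()
    return entropy, distance
```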